Note: This lecture will focus only on vector-based spatial analysis. We will not cover raster-based spatial analysis, although this is an equally important subject. I’ll provide some links to further resources at the bottom of this document for those of you who want to explore further on your own.

Requirements

External libraries (requirements vary by OS)

We’re going to be doing all our spatial analysis and plotting today in R. Behind the scenes, R provides bindings to powerful open-source GIS libraries. These include the Geospatial Data Abstraction Library (GDAL) and Interface to Geometry Engine Open Source (GEOS) API suite, as well as access to projection and transformation operations from the PROJ.4 library. You needn’t worry about all this, but for the fact that you may need to install some of these external libraries first. The requirements vary by OS:

  • Linux: Requirements vary by distribution. See here.
  • Mac: You should be fine to proceed directly to the R packages installation below. An unlikely exception is if you’ve configured R to install packages from source; in which case see here.
  • Windows: Same as Mac, you should be good to go unless you’re installing from source. In which case, see here.

R packages

  • New: sf, lwgeom, maps, mapdata, spData, tigris, tidycensus, leaflet, tmap, tmaptools
  • Already used: tidyverse, hrbrthemes

Note that you’ll need ggplot2 version 3.0.0 or above for the required plotting support of sf objects. That’s almost certainly the case, but now is a good time to upgrade you existing packages if you haven’t done that for a while.

Okay, let’s install (if necessary) and load everything. As per usual, I’ll also set my preferred plotting theme, but this is unnecessary.

Census API key

Finally, we’ll be accessing some data from the US Census Bureau through the tidycensus package. This will require a Census API key, which you can request here. Once that’s done, you can set it using the tidycensus::census_api_key() function. I recommend using the “install = T” option to save your key for future useage. See the function’s help file for more information.

Introduction: CRS and map projections

Student presentation time.

If you’re reading this after the fact, I recommend these two helpful resources. The very short version is that spatial data (like all coordinate-based systems) only make sense relative to some fixed point. That fixed point is what the Coordinate Reference Systems, or CRS, is trying to set. In R, we define the CRS with a “proj4string”, which is based on the syntax of the PROJ.4 geospatial library.

Similarly, whenever we try to plot (some part of) the earth on a map, we’re effectively trying to project a 3-D object onto a 2-D surface. This will necessarily create some kind of distortion. Different types of map projections limit distortions for some parts of the world at the expense of others. Distortions are particularly acute in the extreme latitudes for some projections, so try to choose according to the specific needs of your research question. For example, take a look at how badly the (in)famous Mercator projection distorts parts of the global map (source):

You can also “orient” your projection to a specific latitude and longitude using the PROJ.4 syntax. We’ll see some examples below, but first I’m obliged to share this XKCD summary. (Do yourself a favour and click on the link.)

Simple Features and the sf package

R has long provided excellent support for spatial analysis and plotting (primarily through the sp, rgdal, rgeos, and raster packages). However, until recently, the complex structure of spatial data necessitated a set of equally complex spatial objects in R. I won’t go into details, but a spatial object (say, a SpatialPolygonsDataFrame) was typically comprised of several “layers” — much like a list — with each layer containing a variety of “slots”. While this approach did (and still does) work perfectly well, the convoluted structure provided some barriers to entry for newcomers. It also made it very difficult to incorporate spatial data into the tidyverse ecosystem that we’re familiar with. Luckily, all this has changed thanks to the advent of the sf package.

The “sf” stands for simple features, which is a simple (ahem) standard for representing the spatial geometries of real-world objects on a computer.1 These objects — i.e. “features” — could include a tree, a building, a country’s border, or the entire globe. The point is that they are characterised by a common set of rules, defining everything from how they are stored on our computer to which geometrical operations can be applied them. Of greater importance to us here, however, is the fact that sf represents these features in R as data frames. This means that all of our favourite tidyverse operations can be applied to spatial data, in addition to some specialized functions that we’ll cover next.

Reading in spatial data

Somewhat confusingly, most of the functions in the sf package start with the prefix st_. This stands for spatial and temporal and a basic command of this package is easy enough once you remember that you’re probably looking for st_SOMETHING().2

Let’s demonstrate by reading in the North Carolina counties shapefile that comes bundled with the sf package. As you might have guessed, we’re going to use the st_read() command and the sf package will handle all the heavy lifting behind the scenes.

Simple Features as data frames

Let’s print out the “nc” object that we just created and take a look at its structure.

## Simple feature collection with 100 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
## First 10 features:
##     AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74
## 1  0.114     1.442  1825    1825        Ashe 37009  37009        5  1091
## 2  0.061     1.231  1827    1827   Alleghany 37005  37005        3   487
## 3  0.143     1.630  1828    1828       Surry 37171  37171       86  3188
## 4  0.070     2.968  1831    1831   Currituck 37053  37053       27   508
## 5  0.153     2.206  1832    1832 Northampton 37131  37131       66  1421
## 6  0.097     1.670  1833    1833    Hertford 37091  37091       46  1452
## 7  0.062     1.547  1834    1834      Camden 37029  37029       15   286
## 8  0.091     1.284  1835    1835       Gates 37073  37073       37   420
## 9  0.118     1.421  1836    1836      Warren 37185  37185       93   968
## 10 0.124     1.428  1837    1837      Stokes 37169  37169       85  1612
##    SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
## 1      1      10  1364     0      19 MULTIPOLYGON (((-81.47276 3...
## 2      0      10   542     3      12 MULTIPOLYGON (((-81.23989 3...
## 3      5     208  3616     6     260 MULTIPOLYGON (((-80.45634 3...
## 4      1     123   830     2     145 MULTIPOLYGON (((-76.00897 3...
## 5      9    1066  1606     3    1197 MULTIPOLYGON (((-77.21767 3...
## 6      7     954  1838     5    1237 MULTIPOLYGON (((-76.74506 3...
## 7      0     115   350     2     139 MULTIPOLYGON (((-76.00897 3...
## 8      0     254   594     2     371 MULTIPOLYGON (((-76.56251 3...
## 9      4     748  1190     2     844 MULTIPOLYGON (((-78.30876 3...
## 10     1     160  2038     5     176 MULTIPOLYGON (((-80.02567 3...

Now we can see the explicit data frame structure that was I talking about earlier. The “nc” object has the familar tibble-style output that we’re used to (e.g. it only prints the first 10 rows of the data). However, it also has some additional information in the header, like a description of the geometry type (“MULTIPOLYGON”) and the proj4string CRS (“+proj=longlat +datum=NAD27 +no_defs”). One thing I want to note in particular is the “geometry” column right at the end of the data frame. This geometry column is how the sf package achieves much of its magic: It stores the geometries of each row element in its own list column.3 Since all we really care about are the key feature attributes — county name, FIPS code, population size, etc. — we can focus on those instead of getting bogged down by hundreds (or thousands or even millions) of coordinate points. In turn, this all means that our favourite tidyverse operations and syntax (including the pipe operator %>%) can be applied to spatial data. Let’s review some examples, starting with plotting.

Plotting and projection with ggplot2

Plotting sf objects is incredibly easy thanks to the package’s integration with ggplot2.[Plotting sf maps using the base R plot function is also easy (and often faster). However, I feel that you give up a lot of control and intuition by moving away from the layered, “graphics of grammar” approach of ggplot2.] The key ggplot geom to remember is geom_sf(). I’m going to go ahead and load the whole tidyverse (i.e. not just ggplot2), because we’ll soon be using it for some data wrangling too.

To reproject an sf object to a different CRS, we can use st_transform().

## Simple feature collection with 2 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -7160507 ymin: 4364302 xmax: -7077238 ymax: 4404757
## epsg (SRID):    NA
## proj4string:    +proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs
##    AREA PERIMETER CNTY_ CNTY_ID      NAME  FIPS FIPSNO CRESS_ID BIR74
## 1 0.114     1.442  1825    1825      Ashe 37009  37009        5  1091
## 2 0.061     1.231  1827    1827 Alleghany 37005  37005        3   487
##   SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
## 1     1      10  1364     0      19 MULTIPOLYGON (((-7146002 43...
## 2     0      10   542     3      12 MULTIPOLYGON (((-7118113 43...

Or, we can specify a common projection directly in the ggplot call using coord_sf(). This is often the most convenient approach when you are combining multiple sf data frames in the same plot.

Data wrangling with dplyr and tidyr

As I keep saying, the tidyverse approach to data wrangling carries over very smoothly to sf objects. For example, the standard dplyr verbs like filter(), mutate() and select() all work:

## Simple feature collection with 3 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -79.01814 ymin: 35.85786 xmax: -75.95718 ymax: 36.55629
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
##          NAME  AREA AREA_1000 PERIMETER CNTY_ CNTY_ID  FIPS FIPSNO
## 1 Northampton 0.153       153     2.206  1832    1832 37131  37131
## 2      Camden 0.062        62     1.547  1834    1834 37029  37029
## 3      Durham 0.077        77     1.271  1908    1908 37063  37063
##   CRESS_ID BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79
## 1       66  1421     9    1066  1606     3    1197
## 2       15   286     0     115   350     2     139
## 3       32  7970    16    3732 10432    22    4948
##                         geometry
## 1 MULTIPOLYGON (((-77.21767 3...
## 2 MULTIPOLYGON (((-76.00897 3...
## 3 MULTIPOLYGON (((-79.01814 3...

You can also perform “group_by” and “summarise” operations as per normal (see here for a nice example). Furthermore, the dplyr family of join functions also work, which can be especially handy when combining different datasets by (say) FIPS code or some other attribute. However, this presumes that only one of the objects has a specialized geometry column. In other words, it works when you are joining an sf object with a normal data frame. In cases where you want to join two sf objects based on their geometries, there’s a specialized st_join() function. I provide an example of this latter operation in the section on geometric operations below.

And, just to show that we’ve got the bases covered, you can also implement your favourite tidyr verbs. For example, we can tidyr::gather() the data to long format, which is useful for facetted plotting. Here I demonstrate using the “BIR74” and “BIR79” columns (i.e. the number of births in each county in 1974 and 1979, respectively).

Specialized geometric operations

Alongside all the tidyverse functionality, the sf package comes with a full suite of geometrical operations. You should take a look at at the third sf vignette or the Geocomputation with R book to get a complete overview. However, here are a few examples to get you started:

So-called unary operations are applied to a single object. For instance, you can “melt” sub-elements of an sf object (e.g. counties) into larger elements (e.g. states) using st_union():

Or, you can get the st_area(), st_centroid(), st_boundary(), st_buffer(), etc. of an object using the appropriate command. For example:

## Units: [m^2]
## [1] 1137388604  611077263 1423489919  694546292 1520740530

And:

Another set of so-called binary operations can be applied to multiple objects. So, we can get things like the distance between two sf objects using st_distance().

Or, we can calculate the intersection of different spatial objects using st_intersection(). For this next example, I’m going to use two new spatial objects: 1) A regional map of France from the maps package and 2) the part of the Seine river network (including its Marne and Yonne tributaries) from the spData package. Don’t worry too much about the process used for loading these datasets; I’ll cover that in more depth shortly. For the moment, just focus on the idea that we want to see which adminstrative regions are intersected by the river network. Start by plotting all of the data to get a visual sense of the overlap:

Now let’s limit it to the intersected regions:

## Simple feature collection with 22 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 0.4931747 ymin: 47.04007 xmax: 5.407725 ymax: 49.52717
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##                   ID  name                       geometry
## 2              Aisne Marne LINESTRING (3.608053 49.089...
## 39       Haute-Marne Marne LINESTRING (5.407725 47.877...
## 62             Marne Marne LINESTRING (4.872966 48.637...
## 81    Seine-et-Marne Marne LINESTRING (3.254238 48.977...
## 83 Seine-Saint-Denis Marne LINESTRING (2.595133 48.876...
## 88      Val-de-Marne Marne LINESTRING (2.53346 48.8581...
## 9               Aube Seine MULTILINESTRING ((4.499758 ...
## 21          Cote-Dor Seine LINESTRING (4.712657 47.512...
## 28           Essonne Seine LINESTRING (2.514207 48.575...
## 29              Eure Seine LINESTRING (1.514229 49.077...

Note that st_intersection() only preserves exact points of overlap. As in, this is the exact path that the rivers follow within these regions. We can see this more explicitly in map form:

If we instead wanted to plot the subsample of intersected provinces (i.e. keeping their full geometries), we have a couple options. We could filter the france object by matching its region IDs with the france_intersected object. However, a more direct option is to use the sf::st_join() function which matches objects based on overlapping (i.e. intersecting) geometries:

That’s about as much sf functionality as I can show you right now. The remaining part of this lecture will cover some additional mapping considerations and some bonus spatial R “swag”. However, I’ll try to slip in a few more sf-specific operations along the way.

Where to get map data

As our first North Carolina examples demonstrate, you can easily import external shapefiles, KML files, etc., into R. Just use the generic st_read() function on any of these formats and the sf package will take care of the rest. However, we’ve also seen with the France example that you might not even need an external shapefile. Indeed, R provides access to a large number of base maps — e.g. countries of the world, US states and counties, etc. — through the maps, (higher resolution) mapdata and spData packages, as well as a whole ecosystem of more specialized GIS libraries.4 To convert these maps into “sf-friendly” data frame format, we can use the st_as_sf() function as per the below examples.

And another digression on the Equal Earth projection

The latest and greatest projection, however, is the “Equal Earth” projection. Matt Strimas-McKay has a nice blog post on it, which you should read. In the interest of brevity, though here’s a bare-bones example. (NB: Note that you will need to upgrade your PROJ4 library to version 5.2.0 before trying the below code chunk.)

Looks great. Note that we used the rnaturalearth package to extract the countries spatial data frame. This looks very similar to our world1 data frame, but avoids some of the weird polygon mappings that can happen under certain projections; particularly when polygons extend over the Greenwich prime meridian.5

Example 2: A single country (i.e. Norway)

The maps and mapdata packages have detailed county- and province-level data for several individual nations. We’ve already seen this with France, but it includes the USA, New Zealand and several other nations. However, we can still use it to extract a specific country’s border using some intuitive syntax. For example, we could plot a base map of Norway as follows.

Hmmm. Looks okay, but we6 don’t really want to include Svalbaard (to the north) and some of the other smaller islands like the Faroes. We can use the sf::st_crop() function to take care of that. We could also improve the projection. The Norewgian Mapping Authority recommends the ETRS89 / UTM projection, for which we can easily obtain the PROJ.4 string using this website.

Aside: I always like to dettach the maps() package once I’m finished using it since it avoids potential namespace conflicts with purrr::map.

BONUS 1: US Census data with tigris and tidycensus

Working with Census data has traditionally quite a pain. You need to register on the website, then download data from various years or geographies separately, merge these individual files, etc. Thankfully, this too has recently bcome much easier thanks to the Census API and — for R at least — the tigris and tidycensus packages from Kyle Walker (a UO grad). This next section will closely follow a tutorial on his website.

NOTE: Before continuing with this section, you will first need to request an API key from the Census. Once that’s done, you can set it using the tidycensus::census_api_key() function. I recommend using the “install = T” option to save your key for future useage.

We start by loading the packages and setting some options for optimized use with the sf package.

Let’s say that our goal is to provide a snapshot of Census rental estimates across different cities in the Pacific Northwest. We start by downloading tract-level rental data for Oregon and Washington using the tidycensus::get_acs() function. Note that you’ll need to look up the correct ID variable (in this case: “DP04_0134”).

## Simple feature collection with 2292 features and 5 fields (with 4 geometries empty)
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -124.7631 ymin: 41.99179 xmax: -116.4635 ymax: 49.00249
## epsg (SRID):    4269
## proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## # A tibble: 2,292 x 6
##    GEOID  NAME       variable estimate   moe                       geometry
##    <chr>  <chr>      <chr>       <dbl> <dbl>             <MULTIPOLYGON [°]>
##  1 53001… Census Tr… DP04_01…      675    54 (((-118.9798 47.26171, -118.5…
##  2 53001… Census Tr… DP04_01…      885    91 (((-118.9821 46.95662, -118.9…
##  3 53001… Census Tr… DP04_01…      658   172 (((-119.3694 46.79559, -119.3…
##  4 53001… Census Tr… DP04_01…      681   128 (((-119.1791 46.83331, -119.1…
##  5 53001… Census Tr… DP04_01…      766   247 (((-119.1905 46.79951, -119.1…
##  6 53003… Census Tr… DP04_01…      770    44 (((-117.48 46.12199, -117.419…
##  7 53003… Census Tr… DP04_01…     1060   248 (((-117.4208 46.33866, -117.4…
##  8 53003… Census Tr… DP04_01…      658    85 (((-117.0722 46.42472, -117.0…
##  9 53003… Census Tr… DP04_01…      687    46 (((-117.0796 46.41444, -117.0…
## 10 53003… Census Tr… DP04_01…      673   113 (((-117.0608 46.40103, -117.0…
## # … with 2,282 more rows

This returns an sf object, which we can plot directly. Hmmm, looks like you want to avoid renting in Seattle if possible…

The above map is very detailed. However, perhaps we’re not interested in the entire set of tract level data, but would rather get a sense of rents within some well-defined metropolitan areas? The tigris package has you covered here. For example, let’s say we want to compare average rents across three Oregon/Washington metros: Portland (and surrounds), Corvallis and Eugene.

Now we do a spatial join on our two data sets using the sf::st_join() function.

## Simple feature collection with 589 features and 6 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -124.1587 ymin: 43.44001 xmax: -121.5144 ymax: 46.18897
## epsg (SRID):    4269
## proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## First 10 features:
##          GEOID                                          NAME  variable
## 85 53011040101 Census Tract 401.01, Clark County, Washington DP04_0134
## 86 53011040102 Census Tract 401.02, Clark County, Washington DP04_0134
## 87 53011040201 Census Tract 402.01, Clark County, Washington DP04_0134
## 88 53011040202 Census Tract 402.02, Clark County, Washington DP04_0134
## 89 53011040203 Census Tract 402.03, Clark County, Washington DP04_0134
## 90 53011040301 Census Tract 403.01, Clark County, Washington DP04_0134
## 91 53011040302 Census Tract 403.02, Clark County, Washington DP04_0134
## 92 53011040403 Census Tract 404.03, Clark County, Washington DP04_0134
## 93 53011040407 Census Tract 404.07, Clark County, Washington DP04_0134
## 94 53011040408 Census Tract 404.08, Clark County, Washington DP04_0134
##    estimate moe                          metro_name
## 85     1011 274 Portland-Vancouver-Hillsboro, OR-WA
## 86      681 572 Portland-Vancouver-Hillsboro, OR-WA
## 87      880 466 Portland-Vancouver-Hillsboro, OR-WA
## 88     1274 336 Portland-Vancouver-Hillsboro, OR-WA
## 89     1500 691 Portland-Vancouver-Hillsboro, OR-WA
## 90      973 142 Portland-Vancouver-Hillsboro, OR-WA
## 91     1422 123 Portland-Vancouver-Hillsboro, OR-WA
## 92     1338 332 Portland-Vancouver-Hillsboro, OR-WA
## 93     1046 115 Portland-Vancouver-Hillsboro, OR-WA
## 94     1101 314 Portland-Vancouver-Hillsboro, OR-WA
##                          geometry
## 85 MULTIPOLYGON (((-122.556 45...
## 86 MULTIPOLYGON (((-122.5217 4...
## 87 MULTIPOLYGON (((-122.7491 4...
## 88 MULTIPOLYGON (((-122.6462 4...
## 89 MULTIPOLYGON (((-122.6723 4...
## 90 MULTIPOLYGON (((-122.751 45...
## 91 MULTIPOLYGON (((-122.796 45...
## 92 MULTIPOLYGON (((-122.6755 4...
## 93 MULTIPOLYGON (((-122.5476 4...
## 94 MULTIPOLYGON (((-122.5996 4...

One useful way to summarize this data and compare across metros is with a histogram. Note that “regular” ggplot2 geoms and functions play perfectly nicely with sf objects (i.e. we aren’t limited to geom_sf()).

BONUS 2: Interactive maps with leaflet

Now that you’ve grasped the basic properties of sf objects and how to plot them using ggplot2, its time to scale up with interactive maps.7 You have several package options here, but I think the best are leaflet and plotly. We’ve already covered the latter in a previous lecture, so I’ll simply redirect interested parties to this link for some sf-specific examples. To expand on the former in more depth, leaflet is a lightweight JavaScript library for interactive mapping that has become extremely popular in recent years… and with good reason. The good people at RStudio have kindly packaged a version of leaflet for R, which basically acts as a wrapper to the underlying JavaScript library.

The leaflet syntax is a little different to what you’ve seen thus far and I strongly encourage you to visit the package’s excellent website for the full set of options. However, a key basic principle that it shares with ggplot2 is that you build your plot in layers. Here’s an example adapted from Julia Silge, which builds on the tidycensus package that we saw above. This time, our goal is to plot county-level population densities for Oregon as a whole and produce some helpful popup text if a user clicks on a particular county. First, we download the data using the tidycensus package and inspect the resulting data frame.

## Simple feature collection with 36 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -124.5662 ymin: 41.99179 xmax: -116.4635 ymax: 46.29204
## epsg (SRID):    4269
## proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## First 10 features:
##    GEOID                     NAME   variable estimate moe
## 1  41001     Baker County, Oregon B01003_001    15980  NA
## 2  41003    Benton County, Oregon B01003_001    88249  NA
## 3  41005 Clackamas County, Oregon B01003_001   399962  NA
## 4  41007   Clatsop County, Oregon B01003_001    38021  NA
## 5  41009  Columbia County, Oregon B01003_001    50207  NA
## 6  41011      Coos County, Oregon B01003_001    62921  NA
## 7  41013     Crook County, Oregon B01003_001    21717  NA
## 8  41015     Curry County, Oregon B01003_001    22377  NA
## 9  41017 Deschutes County, Oregon B01003_001   175321  NA
## 10 41019   Douglas County, Oregon B01003_001   107576  NA
##                          geometry
## 1  MULTIPOLYGON (((-118.5194 4...
## 2  MULTIPOLYGON (((-123.8167 4...
## 3  MULTIPOLYGON (((-122.8679 4...
## 4  MULTIPOLYGON (((-123.5989 4...
## 5  MULTIPOLYGON (((-123.3665 4...
## 6  MULTIPOLYGON (((-124.4626 4...
## 7  MULTIPOLYGON (((-121.1083 4...
## 8  MULTIPOLYGON (((-124.3239 4...
## 9  MULTIPOLYGON (((-122.0019 4...
## 10 MULTIPOLYGON (((-124.2145 4...

So, the popup text of interest is held within the “NAME” and “estimate” columns. I’ll use a bit of regular expression work to extract the county name from the “NAME” column (i.e. without the state) and then build up the map layer by layer. Note that the leaflet syntax requires that I prepend variables names with a tilde (~) when I refer to them in the plot building process. This tilde operates in much the same way as the asthetics (aes()) function does in ggplot2. One other thing to note is that I need to define a colour palette — which I’ll col_pal here — separately from the main plot. This is a bit of an inconvience if you’re used to the fully-integrated ggplot2 API, but only a small one.

No suprises here. The bulk of Oregon’s population is situated just west of the Cascades, in cities connected along the I-5.

A particularly useful feature of interactive maps is not being limited by scale or granularity, so you can really dig in to specific areas and neighbourhoods. Here’s an example using home values from Lane County.

## Getting data from the 2013-2017 5-year ACS

BONUS 3: Other map plotting options (plot, tmap, etc.)

While I think that ggplot2 (together with sf) and leaflet are the best value bet for map plotting in R — especially for relative newcomers that have been inculcated into the tidyverse ecosystem — it should be said that there are many other options available to you. The base R plot function is very powerful and handles all manner of spatial objects. (It is also extremely fast.) The package that I’ll highlight briefly in closing, however, is tmap. The focus of the tmap package is thematic maps that are “production ready”. The package is extremely flexible and can accept various spatial objects and output various map types (including interactive). Moreover, the syntax should look very familar to us, since it is inspired by ggplot2’s layered graphics of grammar approach. Here’s an example of a great looking map taken from the tmap homepage.

Further reading

You could easily spend a whole semester (or degree!) on spatial analysis and, more broadly, geocomputation. I’ve simply tried to give you as much useful information as can reasonably be contained in one lecture. Here are some resources for further reading and study:

  • The package websites that I’ve linked to throughout this tutorial are an obvious next port of call for delving deeper into their functionality: sf, leaflet, etc.
  • The best overall resource right now may be Geocomputation with R, a superb new text by Robin Lovelace, Jakub Nowosad, and Jannes Muenchow. This is a “living”, open-source document, which is constantly updated by its authors and features a very modern approach to working with geographic data. Highly recommended.
  • Similarly, the rockstar team behind sf, Edzer Pebesma and Roger Bivand, are busy writing their own book, Spatial Data Science. This project is currently less developed, but I expect it to become the key reference point in years to come. Imporantly, both of the above books cover raster-based spatial data.
  • On the subject of raster data… If you’re in the market for shorter guides, Jamie Afflerbach has a great introduction to rasters here. At a slightly more advanced level, UO’s very own Ed Rubin has typically excellent tutorial here. Finally, the sf team is busy developing a new package called stars, which will provide equivalent functionality (among other things) for raster data.

  1. See the first of the excellent sf vignettes for more details.

  2. I rather wish they’d gone with a sf_ prefix myself — or at least created aliases for it — but the package developers are apparently following standard naming conventions from PostGIS.

  3. For example, we could print out the coordinates needed to plot the first element in our data frame, Ashe county, by typing nc$geometry[[1]]. In contrast, I invite you to see how complicated the structure of a traditional spatial object is by running, say, str(as(nc, "Spatial")).

  4. The list of specialised maps packages is far too long for me to cover here. You can get marine regions, protected areas, nightlights, …, etc., etc.

  5. To see for yourself, try running the above chunk with the world1 object that we created earlier iinstead of countries. Remember to transform its CRS to the Equal Earth projection first.

  6. The royal “we”. The Dude abides.

  7. The ability to easily plot interactive maps from R is one of the main reasons that I switched all of my public presentations from PDFs to RMarkdown-driven HMTL.